Title: Detecting Misaligned and Missing Concepts in SNOMED CT using Structural and Lexical Patterns
Abstract
Objective: Quality assurance of large ontological systems such as SNOMED CT is an indispensable part of the terminology management lifecycle. We introduce a hybrid structural-lexical method for scalable and systematic discovery of novel anomalies in SNOMED CT. The structural component is based on is-a relations that concepts share with other concepts. The lexical component leverages words shared between the descriptions of concepts.

Material and Methods: All non-lattice subgraphs (the structural part) in SNOMED CT are exhaustively extracted. Four types of lexical patterns (the lexical part) are identified among the concepts involved in non-lattice subgraphs. Non-lattice subgraphs exhibiting such lexical patterns are often indicative of misaligned and missing concepts.

Results: Applying our hybrid structural-lexical method to the September 2015 version of SNOMED CT (U.S. edition), we extracted 171,011 non-lattice subgraphs, of which 6,801 matched the lexical patterns. Of these, 2,046 were small subgraphs of size 4 to 6. A random sample of 100 of these small subgraphs was selected, visualized, and manually reviewed by two domain experts. Of the 100, 59 (59%) revealed errors confirmed by the experts. The most frequent type of error was a missing is-a relation caused by incomplete or inconsistent modeling of the concepts.

Discussion: To our knowledge, the anomalies found by combining non-lattice structure with lexical patterns have not been uncovered by existing ontology quality assurance approaches. Non-lattice subgraphs of sizes 4, 5, and 6 can be easily visualized for manual inspection by experts. It also makes sense to investigate them first, because they are often contained in larger subgraphs.

Conclusions: Our hybrid structural-lexical method is innovative and effective in detecting SNOMED CT anomalies that have escaped existing quality assurance processes.

References
1. Zhang GQ, Zhu W, Sun M, Tao S, Bodenreider O, Cui L. MaPLE: A MapReduce Pipeline for Lattice-based Evaluation and Its Application to SNOMED CT. Proc IEEE Int Conf Big Data. 2014 Oct;2014:754-759. (PMID: 25705725)
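The structural and lexical components described in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not spell out: a non-lattice pair is taken to be two concepts with more than one maximal common descendant, and the lexical check (a shared descendant whose words are exactly the union of the pair's words) is a hypothetical stand-in, not necessarily one of the four patterns used in the paper. The concept names, `PARENTS`, and all function names are invented for illustration and are not taken from SNOMED CT.

```python
from itertools import combinations

# Toy is-a hierarchy (child -> list of parents). Names are invented.
PARENTS = {
    "disorder": [],
    "kidney disorder": ["disorder"],
    "chronic disorder": ["disorder"],
    "chronic kidney disorder": ["kidney disorder", "chronic disorder"],
    "chronic kidney failure": ["kidney disorder", "chronic disorder"],
}


def invert(parents):
    """Build a parent -> children map from a child -> parents map."""
    children = {concept: [] for concept in parents}
    for child, its_parents in parents.items():
        for parent in its_parents:
            children[parent].append(child)
    return children


def down_set(concept, children):
    """Return the concept together with all of its descendants."""
    seen, stack = {concept}, [concept]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def non_lattice_pairs(parents):
    """Map each pair with more than one maximal common descendant to those descendants."""
    children = invert(parents)
    down = {concept: down_set(concept, children) for concept in parents}
    result = {}
    for a, b in combinations(sorted(parents), 2):
        common = down[a] & down[b]
        # Keep only common descendants that lie below no other common descendant.
        maximal = {c for c in common
                   if not any(c != d and c in down[d] for d in common)}
        if len(maximal) > 1:
            result[(a, b)] = maximal
    return result


def union_pattern(pair, lower_bounds):
    """Hypothetical lexical check: lower bounds whose words equal the union of the pair's words."""
    wanted = set(pair[0].split()) | set(pair[1].split())
    return sorted(c for c in lower_bounds if set(c.split()) == wanted)


if __name__ == "__main__":
    for pair, lower_bounds in non_lattice_pairs(PARENTS).items():
        print(pair, sorted(lower_bounds), union_pattern(pair, lower_bounds))
```

In this toy hierarchy the only non-lattice pair is "chronic disorder" and "kidney disorder", and the lexical check singles out "chronic kidney disorder" as the natural common child. That hints that "chronic kidney failure" may be missing an is-a relation to it; making "chronic kidney failure" a child of "chronic kidney disorder" leaves no non-lattice pair. The pairwise loop is quadratic in the number of concepts, so it is only suitable for small examples, not for a terminology the size of SNOMED CT.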
Related articles
Mining non-lattice subgraphs for detecting missing hierarchical relations and concepts in SNOMED CT
Objective: Quality assurance of large ontological systems such as SNOMED CT is an indispensable part of the terminology management lifecycle. We introduce a hybrid structural-lexical method for scalable and systematic discovery of missing hierarchical relations and concepts in SNOMED CT. Material and Methods: All non-lattice subgraphs (the structural part) in SNOMED CT are exhaustively extracte...
Identifying Missing Hierarchical Relations in SNOMED CT from Logical Definitions Based on the Lexical Features of Concept Names
Objectives. To identify missing hierarchical relations in SNOMED CT from logical definitions based on the lexical features of concept names. Methods. We first create logical definitions from the lexical features of concept names, which we represent in OWL EL. We infer hierarchical (subClassOf) relations among these concepts using the ELK reasoner. Finally, we compare the hierarchy obtained from...
Auditing SNOMED CT hierarchical relations based on lexical features of concepts in non-lattice subgraphs
OBJECTIVE We introduce a structural-lexical approach for auditing SNOMED CT using a combination of non-lattice subgraphs of the underlying hierarchical relations and enriched lexical attributes of fully specified concept names. Our goal is to develop a scalable and effective approach that automatically identifies missing hierarchical IS-A relations. METHODS Our approach involves 3 stages. In ...
A comparative study of the evolution and structure of the Systematized Nomenclature of Medicine (SNOMED) systems in the United States, the United Kingdom, and Australia, 85-86
Background and Aim: Systematized Nomenclature of Medicine systems are an important support for electronic health records in the registration and retrieval of data. Systematized Nomenclature of Medicine - Clinical Terms (SNOMED CT) is the most comprehensive such language; it supports the consistency of data exchanged across health care providers and, ultimately, the effectiveness of health care. Material...
Identifying Potentially Missing Hierarchical Relations in SNOMED CT based on Lexical Features - Impact of Synonyms and Lexico-syntactic Constraints
Introduction: The quality assurance of large bio-ontologies is critical for their effective and continued use and is an active area of research [1]. For example, recent investigations highlighted issues in the hierarchical structure of SNOMED CT and its detrimental effects on biomedical applications [2]. Previous work by one of the authors [3] established a method to identify potentially missin...